Universal expressiveness of variational quantum classifiers and quantum kernels for support vector machines

Machine learning is considered to be one of the most promising applications of quantum computing. Therefore, the search for quantum advantage of the quantum analogues of machine learning models is a key research goal. Here, we show that variational quantum classifiers and support vector machines with quantum kernels can solve a classification problem based on the k-FORRELATION problem, which is known to be PROMISEBQP-complete. Because the PROMISEBQP complexity class includes all Bounded-Error Quantum Polynomial-Time (BQP) decision problems, our results imply that there exists a feature map and a quantum kernel that make variational quantum classifiers and quantum kernel support vector machines efficient solvers for any BQP problem. Hence, this work implies that their feature map and quantum kernel, respectively, can be designed to have a quantum advantage for any classification problem that cannot be solved classically in polynomial time but can be solved in polynomial time by a quantum computer.

Quantum machine learning (QML) has recently emerged as a new research field aiming to take advantage of quantum computing for machine learning (ML) tasks [1][2][3][4]. It has been shown that embedding data into gate-based quantum circuits can be used to produce kernels for ML models by quantum measurements [5][6][7][8][9][10][11]. Quantum kernels have been used as kernels of support vector machines (QSVM) for classification [12][13][14][15][16][17][18] and Gaussian process models for regression problems [19,20]. Variational quantum circuits have been used to devise variational quantum classifiers (VQC) [5,21,22]. However, for QML to become a new computational paradigm, it is necessary to prove and demonstrate the computational advantage of ML models based on quantum circuits.
Computational problems are classified in computational complexity theory according to how the time and memory requirements in a computational model scale with the problem size. For example, the classical complexity class P encompasses all decision problems that are solvable on a deterministic Turing machine in time that scales polynomially with the problem size. Analogously, the class NP can be defined to encompass problems solvable on a non-deterministic Turing machine in polynomial time. Problems solvable in polynomial time are considered efficient. Hence, decision problems in P are efficiently solvable by classical computers, but it is assumed that this is not the case for all problems in NP (P ≠ NP). Problems can further be in special relations to complexity classes. A problem is hard relative to a complexity class if every problem in this class can be reduced to it under an efficient transformation, so that it is at least as difficult to solve as any problem in this class. A problem is complete relative to a complexity class if it is hard for this class and also belongs to it. Importantly, a hard problem need not belong to the class: it is complete only if it is also a member of the class, and it may otherwise lie in a hierarchically higher class.
Quantum computing problems are classified by quantum complexity theory [23]. In particular, the class BQP (bounded-error quantum polynomial time) encompasses decision problems solvable in polynomial time by a quantum Turing machine (the uniform family of polynomial-size quantum circuits) with at most 1/3 probability of error. While BQP includes P, because all efficient classical computations can be performed deterministically using quantum circuits with polynomial depth, BQP is assumed to also include problems that are not in P. Under this assumption, BQP-complete problems are not in P: otherwise, BQP would be equal to P and there would be no quantum advantage for any quantum computing algorithm. Thus, it is believed that BQP-complete problems cannot be solved in polynomial time on a classical computer. The hierarchy and relations of complexity classes relevant for this work are shown in Fig. 1.
To demonstrate quantum advantage of QSVM, Liu et al. [18] considered the DISCRETE LOGARITHM PROBLEM (DLP). The problem is to find the logarithm $x = \log_g y$ in the multiplicative group of integers modulo a prime $p$ (denoted as $\mathbb{Z}_p^*$) for a generator $g$, i.e., $x$ such that $g^x \equiv y \pmod{p}$. DLP is believed, but not rigorously proven, to be unsolvable in time polynomial in the number of bits $n = \lceil \log_2 p \rceil$ on a classical computer. Furthermore, computing only the most significant bit of $x = \log_g y$ for a $\frac{1}{2} + \frac{1}{\mathrm{poly}(n)}$ fraction of $x \in \mathbb{Z}_p^*$ is as hard as solving DLP [18,24]. This forms a decision problem (DLP$_{1/2}$), presumed to be in NP, which was adapted by Liu et al. [18] into a classification task to prove a separation between QSVM and classical ML classifiers. Given that DLP$_{1/2}$ is in NP (as shown in Fig. 1 by the square), it can be argued that DLP$_{1/2}$ cannot be a BQP-complete problem [25]. Therefore, one cannot generalize the results of Liu et al. [18] to arbitrary problems in BQP.
In the present work, we show that VQC and QSVM can solve a problem that is complete in a hierarchically higher class in relation to BQP, namely PROMISEBQP. As such, our results imply that there exists a quantum kernel or a feature map that makes VQC and QSVM efficient solvers for any problem with BQP complexity.

Results
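We use the k-FORRELATION problem that is proven to be PROMISEBQP-complete [26]. As defined and described in detail in the Methods section, the k-FORRELATION problem considers k Boolean functions $f_1, \dots, f_k: \{0,1\}^n \to \{-1,1\}$ and asks to decide whether the quantity
$$\Phi_{f_1,\dots,f_k} = \frac{1}{2^{(k+1)n/2}} \sum_{x_1,\dots,x_k \in \{0,1\}^n} f_1(x_1)(-1)^{x_1 \cdot x_2} f_2(x_2)(-1)^{x_2 \cdot x_3} \cdots (-1)^{x_{k-1} \cdot x_k} f_k(x_k) \tag{1}$$
with $x \cdot y = \sum_{i=1}^{n} x_i y_i$ satisfies $\Phi_{f_1,\dots,f_k} \ge 3/5$ or $|\Phi_{f_1,\dots,f_k}| \le 1/100$, promised that one of the two holds. We first introduce a classification problem based on the k-FORRELATION promise problem including a compact data encoding scheme. Correctly classifying such a data set requires an algorithm with PROMISEBQP-complete complexity.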
We then show that this classification problem can be solved efficiently and with arbitrary accuracy by both quantum-enhanced classification algorithms: VQC and QSVM, which are reviewed in detail in the Methods section. Therefore, the resulting classification models solve the k-FORRELATION problem in the PROMISEBQP setting and can represent any algorithm to solve all PROMISEBQP problems. In other words, we show that these quantum-enhanced classification algorithms are of PROMISEBQP-complete expressive power.

k-FORRELATION classification data set
We formulate a classification problem with the same complexity as the k-FORRELATION problem. Generally, given a promise problem $\Pi = (\Pi_+, \Pi_-)$, one can obtain a data set $D = \{x_i, y_i\}_{i \in \{1,\dots,m\}}$ by encoding $m = m_+ + m_-$ instances from $\Pi$ into input vectors $x_i$, where the $m_+$ instances sampled from $\Pi_+$ are labeled with class $y_i = +1$ whereas the $m_-$ instances sampled from $\Pi_-$ are labeled with class $y_i = -1$. Deriving a data set based on the k-FORRELATION problem is not straightforward since the problem instances $\Pi_+ \cup \Pi_-$ consist of k-tuples of Boolean functions with n-bit inputs, for which the description length to encode an instance generally grows exponentially in n. Specifically, an arbitrary n-bit Boolean function needs $2^n$ bits to encode the evaluation outcome for the $2^n$ possible inputs. Since a k-FORRELATION instance incorporates k such functions, the resulting data set would have dimensionality $k 2^n$.
We use the restriction that each Boolean function $f_i$ depends on at most three input bits, which is allowed for k-FORRELATION to remain PROMISEBQP-complete as long as at least one function depends on exactly three bits [26]. More specifically, each function can be restricted to be either constant, $f_i(x) = 1$, or of the form $f_i(x) = (-1)^{C_i(x)}$, where $C_i(x)$ is a product of at most three bits. This enables one to encode a k-FORRELATION instance using up to three indices per function $f_i$ indicating the input bits involved in the product $C_i(x)$, or none indicating the constant function $f_i(x) = 1$. We propose an explicit and practically effective multi-hot encoding scheme. Each function $f_i$ can be represented by an n-dimensional binary vector where a 1 in the j-th component indicates that the j-th input bit $x_j$ is incorporated in the product $C_i(x)$. The constant function $f_i(x) = 1$ can be encoded as the zero vector. For example, with n = 3, the k = 3 Boolean functions $f_1(x) = (-1)^{x_1 x_3}$, $f_2(x) = +1$ and $f_3(x) = (-1)^{x_2}$ would be encoded as $x = (1, 0, 1, 0, 0, 0, 0, 1, 0)^\top$. The resulting encoding of a k-FORRELATION instance and, therefore, the data dimensionality is kn, which is linear in k and, since k = poly(n), polynomial in n instead of exponential in n.
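The multi-hot encoding described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `encode_instance` and the representation of each function as a tuple of 1-based bit indices are assumptions made for this example and are not taken from the paper.

```python
from typing import List, Sequence


def encode_instance(functions: Sequence[Sequence[int]], n: int) -> List[int]:
    """Multi-hot encode k Boolean functions into a vector of length k*n.

    Each function is given by the 1-based indices of the input bits in its
    product C_i(x); an empty sequence stands for the constant f_i(x) = 1.
    (Hypothetical helper, for illustration only.)
    """
    x: List[int] = []
    for bits in functions:
        if len(set(bits)) > 3:
            raise ValueError("each function may depend on at most three bits")
        block = [0] * n  # the zero vector encodes the constant function
        for j in bits:
            if not 1 <= j <= n:
                raise ValueError(f"bit index {j} is outside 1..{n}")
            block[j - 1] = 1
        x.extend(block)
    return x


# The example from the text: f1 = (-1)^(x1*x3), f2 = +1, f3 = (-1)^(x2), n = 3.
example = encode_instance([(1, 3), (), (2,)], n=3)
print(example)
```

The encoded vector has length kn, so its size grows linearly in k rather than as $k 2^n$.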
Aaronson and Ambainis [26] established the quantum algorithm to solve the k-FORRELATION problem with a constant query complexity by encoding the Boolean functions $f_i$ into unitary transformations $U_{f_i}$ with $U_{f_i}|x\rangle = f_i(x)|x\rangle \ \forall x \in \{0,1\}^n$, which are diagonal in the computational basis, and applying them successively to the initial state $|0\rangle^{n}$ with leading and subsequent Hadamard gates (H). The full quantum circuit can be represented as
$$U_F = H^{\otimes n} U_{f_k} H^{\otimes n} \cdots H^{\otimes n} U_{f_2} H^{\otimes n} U_{f_1} H^{\otimes n}, \tag{2}$$
where a function $f_i(x) = (-1)^{C_i(x)}$ with the product $C_i(x)$ comprising one, two or three bits induces a Z, controlled-Z or controlled-controlled-Z gate, respectively, which causes a relative phase-flip conditioned on the values of up to three qubits [27]. In the final state $U_F|0\rangle^{n}$, $\Phi_{f_1,\dots,f_k}$ is equal to the amplitude of state $|0\rangle^{n}$ and can, therefore, be estimated by measurements in the computational basis to decide the k-FORRELATION problem.
Fig. 1: Hierarchy and relations of the complexity classes relevant for this work. This includes the discrete logarithm decision problem DLP$_{1/2}$ (red square) and the (explicit) k-FORRELATION promise problem (red star). We use the following established, but not yet proven, assumptions: DLP$_{1/2}$ in NP, P ≠ NP, P ≠ BQP (⇒ existence of quantum advantage), NP-complete is outside BQP, (PROMISE)BQP-complete is outside NP.
We use the feature map $|\Phi(x)\rangle = U_{\Phi(x)}|0\rangle^{n} = U_{F(x)}|0\rangle^{n}$, where $U_{F(x)}$ is defined by Eq. (2) under the k Boolean functions encoded in the data sample x. We show that when used for VQC and for kernel construction in QSVM, this feature map leads to classification models that predict the correct class associated with the k-FORRELATION instance encoded in the data sample x. This classification can be made arbitrarily accurate by increasing the number of measurements estimating the probability of $|0\rangle^{n}$ and is perfect given the exact measurement probability.
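The claim that the forrelation value equals the amplitude of the all-zero state can be checked numerically for small instances. The sketch below is a pure-Python state-vector simulation, assuming that circuit (2) alternates Hadamard layers with the diagonal unitaries $U_{f_i}$ as described in the text and that the forrelation is the standard k-fold sum of Aaronson and Ambainis. All function names are hypothetical, and the brute-force sum is only feasible for very small n and k.

```python
from itertools import product
from math import sqrt
from typing import List, Sequence, Tuple

Func = Tuple[int, ...]  # 1-based indices of the bits in C_i(x); () is f_i(x) = 1


def f_value(bits: Func, z: int) -> int:
    """Return f(z) = (-1)^{C(z)}, where input bit j is stored as bit j-1 of z."""
    return -1 if bits and all((z >> (j - 1)) & 1 for j in bits) else 1


def hadamard_all(state: List[float]) -> List[float]:
    """Apply a Hadamard gate to every qubit (normalized Walsh-Hadamard transform)."""
    out = state[:]
    h = 1
    while h < len(out):
        for start in range(0, len(out), 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    norm = sqrt(len(out))
    return [v / norm for v in out]


def forrelation_amplitude(funcs: Sequence[Func], n: int) -> float:
    """Amplitude of |0...0> after H, U_f1, H, ..., U_fk, H applied to |0...0>."""
    state = [0.0] * (2 ** n)
    state[0] = 1.0
    state = hadamard_all(state)
    for bits in funcs:
        state = [f_value(bits, z) * a for z, a in enumerate(state)]
        state = hadamard_all(state)
    return state[0]


def forrelation_sum(funcs: Sequence[Func], n: int) -> float:
    """Brute-force evaluation of the k-fold forrelation sum (exponential cost)."""
    k = len(funcs)
    total = 0
    for xs in product(range(2 ** n), repeat=k):
        term = 1
        for i, bits in enumerate(funcs):
            term *= f_value(bits, xs[i])
            if i + 1 < k:
                # (-1)^{x_i . x_{i+1}}: parity of the bitwise AND
                term *= -1 if bin(xs[i] & xs[i + 1]).count("1") % 2 else 1
        total += term
    return total / 2 ** ((k + 1) * n / 2)


funcs = [(1, 3), (), (2,)]  # the n = 3, k = 3 example used for the encoding
print(forrelation_amplitude(funcs, 3), forrelation_sum(funcs, 3))
```

Both routes give the same number up to floating-point error, while the circuit route needs only k + 1 Hadamard layers and k diagonal gates.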

k-FORRELATION training data
We now show how to generate positive and negative training samples $x_+$ and $x_-$ of a classification problem for VQC and QSVM such that the quantum state $|\Phi(x_\pm)\rangle = U_{F(x_\pm)}|0\rangle^{n}$ produced by circuit (2) in the feature map or quantum kernel corresponds to the positive class sample if all qubits are in state $|0\rangle$ and to the negative class sample if they are in another computational basis state $|z\rangle$ with $0 < z < 2^n$. To do this, we use the following theorem, which is proven in the Methods section:
Theorem 1 (odd-k-FORRELATION). Explicit k-FORRELATION remains PROMISEBQP-complete when k is restricted to odd k ≥ 3.
First, we show how to obtain a positive sample $x_+$ such that the initial state is preserved under circuit (2), i.e., $U_{F(x_+)}|0\rangle^{n} = |0\rangle^{n}$. For an odd number k of Boolean functions, circuit (2) includes k + 1 layers of Hadamard gates, an even number. If all $f_i(x) = +1$, the initial state is preserved since $U_{f_i} = I$ and the resulting pairs of successive Hadamard layers cancel. To fulfill the condition that at least one Boolean function must depend on exactly three bits, we choose, without loss of generality, the first and third Boolean functions to be $f_1(x) = f_3(x) = (-1)^{x_i x_j x_l}$ and keep all other functions constant. With this choice, the two Hadamard layers between $U_{f_1}$ and $U_{f_3}$ cancel around $U_{f_2} = I$, and $U_{f_3} U_{f_1} = I$ because the controlled-controlled-Z gate is its own inverse. The positive sample $x_+$ encoding these functions therefore gives $U_{F(x_+)}|0\rangle^{n} = |0\rangle^{n}$.
Second, we proceed with generating a negative sample $x_-$ for which circuit (2) maps $|0\rangle^{n}$ to a different computational basis state, i.e., $U_{F(x_-)}|0\rangle^{n} = |z\rangle$ with $0 < z < 2^n$. Observe that the unitary $U_{f_i}$ with $f_i(x) = (-1)^{x_j}$ implements a Pauli-Z gate on qubit j, which becomes a Pauli-X gate when sandwiched between Hadamard gates, $HZH = X$. This flip of qubit j transforms the initial state into another computational basis state $|z\rangle$ with $z_j = 1$. Without loss of generality, we fix i = 1 and choose a subsequent $f_2(x)$ fulfilling the three-qubit dependence condition for PROMISEBQP-completeness, so that all the following k − 1 Hadamard layers, an even number, cancel pairwise when the remaining functions $f_l$ with l > 2 are constant, $f_l(x) = 1$. Thus, $f_2(x)$ can only cause a global phase-flip on $|z\rangle$, which can be ignored, and preserves the non-zero basis state of qubit j such that $U_{F(x_-)}|0\rangle^{n} = |2^{j-1}\rangle \neq |0\rangle$.
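The construction of the two training samples can be verified with a small state-vector simulation. The sketch below assumes n = 3 qubits, k = 5 functions, and the multi-hot layout of k blocks of n bits, with input bit j stored as bit j − 1 of the basis-state index. The helper names are hypothetical and chosen only for this illustration.

```python
from math import sqrt
from typing import List


def hadamard_all(state: List[float]) -> List[float]:
    """Apply a Hadamard gate to every qubit (normalized Walsh-Hadamard transform)."""
    out = state[:]
    h = 1
    while h < len(out):
        for start in range(0, len(out), 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    norm = sqrt(len(out))
    return [v / norm for v in out]


def apply_circuit(x: List[int], n: int) -> List[float]:
    """Return U_F(x)|0...0> for a multi-hot sample x of length k*n."""
    k = len(x) // n
    # One bitmask per function; mask 0 encodes the constant function.
    masks = [sum(x[i * n + j] << j for j in range(n)) for i in range(k)]
    state = [0.0] * (1 << n)
    state[0] = 1.0
    state = hadamard_all(state)
    for mask in masks:
        # Phase flip on basis states where all bits of the product are 1.
        state = [(-a if mask and (z & mask) == mask else a)
                 for z, a in enumerate(state)]
        state = hadamard_all(state)
    return state


n = 3
# Positive sample: f1 = f3 = (-1)^(x1*x2*x3), all other functions constant.
x_plus = [1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0]
# Negative sample: f1 = (-1)^(x2), f2 = (-1)^(x1*x2*x3), the rest constant.
x_minus = [0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

state_plus = apply_circuit(x_plus, n)
state_minus = apply_circuit(x_minus, n)
print(round(state_plus[0] ** 2, 6), round(state_minus[2] ** 2, 6))
```

Here the negative sample uses j = 2, so the final state is the basis state with index $2^{j-1} = 2$, up to a global sign.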

Universal expressiveness of VQC
We first present the proof for VQC. The VQC model [5] uses a feature map to encode the input data x into an n-qubit quantum state $|\Phi(x)\rangle = U_{\Phi(x)}|0\rangle^{n}$ followed by a parameterized quantum circuit W(θ). A decision rule, involving an additional bias term b ∈ [−1, 1], enables classification by estimating the binary measurement probability $p_{+1}(x) = \langle\Phi(x)|W^\dagger(\theta) M_{+1} W(\theta)|\Phi(x)\rangle$ to classify x as positive if
$$p_{+1}(x) \ge \frac{1-b}{2} \tag{5}$$
or negative otherwise.
Proof. We use proof by reduction, where our goal is to find a decision rule (5) that predicts class +1 for each instance of the k-FORRELATION problem if and only if it is positive, $x \in \Pi_+$. We start with a data sample x that encodes the functions $f_1, \dots, f_k$ and note that the choice of the k-FORRELATION feature map $U_{\Phi(x)} = U_{F(x)}$, the observable $M_{+1} = |0\rangle^{n}\langle 0|^{n}$ and parameters θ such that W(θ) = I leads to
$$p_{+1}(x) = |\langle 0|^{n} U_{F(x)} |0\rangle^{n}|^2 = |\Phi_{f_1,\dots,f_k}|^2.$$
For the two possible classes of a data sample x, two bounds on b can be derived as follows:
• If x belongs to class +1: $\Phi_{f_1,\dots,f_k} \ge 3/5$ holds and, therefore, $|\Phi_{f_1,\dots,f_k}|^2 \ge (3/5)^2 = 9/25$, which, when inserted into the decision rule (5), yields the requirement $9/25 \ge (1-b)/2$. This only holds if b is chosen to be at least 7/25.
• If x belongs to class −1: $|\Phi_{f_1,\dots,f_k}| \le 1/100$ holds and, therefore, $|\Phi_{f_1,\dots,f_k}|^2 \le (1/100)^2 = 1/10000$. As the decision rule (5) must be violated, i.e., $p_{+1}(x) < (1-b)/2$ for a negative sample x, a second condition can be derived as $1/10000 < (1-b)/2$. This only holds if b is chosen to be less than 4999/5000.
Both conditions are satisfied by any b with $7/25 \le b < 4999/5000$.
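The two bounds on the bias can be reproduced with exact rational arithmetic. The sketch below assumes the decision rule in the form "classify as positive if $p_{+1}(x) \ge (1-b)/2$", which is the form implied by the stated violation condition $p_{+1}(x) < (1-b)/2$ for negative samples, together with the promise thresholds 3/5 and 1/100. The function names are hypothetical.

```python
from fractions import Fraction
from typing import Tuple


def bias_interval(phi_pos: Fraction = Fraction(3, 5),
                  phi_neg: Fraction = Fraction(1, 100)) -> Tuple[Fraction, Fraction]:
    """Return (lower, upper) such that lower <= b < upper separates the classes."""
    p_pos_min = phi_pos ** 2  # smallest p_{+1} for a positive instance
    p_neg_max = phi_neg ** 2  # largest p_{+1} for a negative instance
    lower = 1 - 2 * p_pos_min  # p >= (1 - b)/2  <=>  b >= 1 - 2p
    upper = 1 - 2 * p_neg_max  # p <  (1 - b)/2  <=>  b <  1 - 2p
    return lower, upper


def classify(p_plus: Fraction, b: Fraction) -> int:
    """Apply the assumed decision rule to a measurement probability."""
    return 1 if p_plus >= (1 - b) / 2 else -1


lower, upper = bias_interval()
print(lower, upper)
```

With any bias inside the interval, for example b = 1/2, the worst-case positive probability 9/25 is classified as +1 and the worst-case negative probability 1/10000 as −1.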

Universal expressiveness of QSVM
We now present the proof for QSVM. The QSVM approach uses a quantum computer to estimate the kernel function
$$k(x_i, x_j) = |\langle\Phi(x_i)|\Phi(x_j)\rangle|^2,$$
which is then used when solving the SVM dual problem [5] classically:
$$\max_{\alpha} \ \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j k(x_i, x_j) \tag{10}$$
$$\text{subject to} \quad 0 \le \alpha_i \le C, \quad \sum_{i=1}^{m} \alpha_i y_i = 0. \tag{11}$$
The decision rule for an unseen (test) data sample s, involving an additional bias term b ∈ [−1, 1], is then
$$m(s) = \mathrm{sign}\Big(\sum_{i=1}^{m} \alpha_i y_i k(x_i, s) + b\Big). \tag{12}$$
Article https://doi.org/10.1038/s41467-023-36144-5
Proof. We use proof by reduction to show that QSVM can have PROMISEBQP-complete expressive power. The constraints of the dual optimization problem in Eq. (11) imply that at least two training samples, one from each class, must be provided. Therefore, we consider m = 2 training samples and choose the positive training sample $x_1 = x_+$ with $y_1 = +1$ and the negative training sample $x_2 = x_-$ with $y_2 = -1$ as defined above. The equality constraint in Eq. (11) yields $0 = \alpha_1 y_1 + \alpha_2 y_2 = \alpha_1 - \alpha_2 \iff \alpha_1 = \alpha_2$. We set $\alpha = \alpha_1 = \alpha_2$, which simplifies the dual optimization problem to a one-dimensional optimization constrained to the interval 0 ≤ α ≤ C. Since [0, C] is a closed and bounded (i.e., compact) interval and the objective function is concave, the Weierstraß extreme value theorem guarantees a maximum on this interval. We thus consider α to be the optimal solution, which is guaranteed to be non-negative and can be determined in closed form in terms of the kernel function evaluated at the two training samples, $k(x_1, x_2)$. As shown earlier, the two training samples produce $U_{F(x_+)}|0\rangle^{n} = |0\rangle^{n}$ and $U_{F(x_-)}|0\rangle^{n} = |z\rangle$ with $z \neq 0^n$ when the k-FORRELATION feature map using circuit (2) is applied. When the k-FORRELATION feature map is used to construct the kernel, the prediction mapping in Eq. (12) of QSVM for a (test) data sample s can be simplified to
$$m(s) = \mathrm{sign}\big(\alpha\,[k(x_+, s) - k(x_-, s)] + b\big). \tag{15}$$
Here, the two required quantum kernel function estimates correspond to the probabilities of observing the bit-strings $0^n$ and z in the state produced by the k-FORRELATION quantum circuit, $U_{F(s)}|0\rangle^{n}$, upon measurement in the computational basis. For the two possible cases ±1 of a test sample s, two bounds can be derived for the argument in Eq. (15):
∘ If s belongs to class +1: The measurement probability $|\langle 0^n|U_{F(s)}|0^n\rangle|^2$ is the absolute squared forrelation quantity $|\Phi_{f_1,\dots,f_k}|^2$ corresponding to the k-FORRELATION instance encoded in s, which satisfies $|\langle 0^n|U_{F(s)}|0^n\rangle|^2 \ge (3/5)^2$ in this case. Since the probabilities have to add up to one, every other n-bit bit-string $z \neq 0^n$ can only be observed with a probability of at most $1 - (3/5)^2 = 16/25$, i.e., $|\langle z|U_{F(s)}|0^n\rangle|^2 \le 16/25$. These observations yield a lower bound of
$$\alpha\,[k(x_+, s) - k(x_-, s)] \ge \alpha\left(\frac{9}{25} - \frac{16}{25}\right) = -\frac{7}{25}\alpha.$$
Inserting this bound into m(s), we see that it evaluates to m(s) = +1 provided b is chosen to be greater than 7α/25.
Thus, setting $b \in \left(\frac{7}{25}\alpha, \frac{4999}{5000}\alpha\right)$ guarantees the correct evaluation of the classification mapping m(s) for both cases. Again, the existence of a b that yields the SVM separating the two classes was not a priori guaranteed. That such an interval exists ensures that QSVM has PROMISEBQP-complete expressive power. □
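The one-dimensional dual problem for the two training samples can be worked out explicitly. The sketch below assumes the standard soft-margin SVM dual objective $\sum_i \alpha_i - \frac{1}{2}\sum_{i,j}\alpha_i\alpha_j y_i y_j k(x_i,x_j)$ and a kernel with $k(x,x) = 1$. With $\alpha_1 = \alpha_2 = \alpha$ this objective becomes $2\alpha - \frac{1}{2}\alpha^2\,(k_{11} + k_{22} - 2k_{12})$, whose maximizer on [0, C] is obtained by clipping the stationary point. The function names are hypothetical, and the exact form of the paper's Eqs. (10) to (12) is assumed rather than quoted.

```python
def optimal_alpha(k12: float, C: float, k11: float = 1.0, k22: float = 1.0) -> float:
    """Maximize L(a) = 2a - 0.5 * a^2 * (k11 + k22 - 2*k12) over 0 <= a <= C."""
    curvature = k11 + k22 - 2.0 * k12
    if curvature <= 0.0:
        # Degenerate case: the objective is linear and increasing, so the
        # maximum sits on the upper boundary of the interval.
        return C
    return min(C, 2.0 / curvature)  # stationary point, clipped to [0, C]


def predict(alpha: float, k_pos: float, k_neg: float, b: float) -> int:
    """Two-sample decision rule sign(alpha * (k(x+, s) - k(x-, s)) + b)."""
    value = alpha * (k_pos - k_neg) + b
    return 1 if value > 0 else -1


# The two training states are orthogonal basis states, so k(x+, x-) = 0.
alpha = optimal_alpha(k12=0.0, C=10.0)
print(alpha)
# Worst case for a positive test sample: k(x+, s) = 9/25 and k(x-, s) = 16/25.
print(predict(alpha, 9 / 25, 16 / 25, b=0.5))
```

For orthogonal training states the stationary point is α = 1, so the optimum is min(C, 1), and any b above 7α/25 classifies the worst-case positive sample correctly.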

k-FORRELATION fixed ansatz
Finally, we show that circuit (2) used in the feature map or quantum kernel can be implemented using a parameterized quantum circuit with a fixed ansatz, which is typically used in QML. For a single Boolean function $f_i$ in the multi-hot encoding x, the indices j ∈ {1, …, n} where $x_j = 1$ determine the target and control qubits of the Z gates. To obtain a fixed ansatz, all possible qubit combinations for applying Z gates, controlled-Z gates and controlled-controlled-Z gates in (2) need to be covered. There are $\binom{n}{1} = n \in O(n)$, $\binom{n}{2} = n(n-1)/2 \in O(n^2)$ and $\binom{n}{3} = n(n-1)(n-2)/6 \in O(n^3)$ possible qubit choices, respectively, due to the gate symmetry [27]. Instead of a (controlled-) Z gate, a (controlled) rotation about the Z axis $R_Z(\lambda)$ by an angle parameter λ can be applied, as it is equivalent to the identity if λ = 0 and to the (controlled-) Z gate if λ = π. For a controlled rotation gate applied to the qubits J ⊆ {1, …, n}, the sample x determines λ such that λ = 0 in all (controlled) rotation gates except λ = π for the one that implements $f_i$ encoded in x.
For k functions, the fixed ansatz requires $O(kn^3)$ gates. This shows that the expressiveness of VQC and QSVM proven here can be achieved using parameterized quantum circuits with a fixed ansatz of polynomial depth since k = poly(n). This result is important considering that VQC and QSVM are generally implemented using circuits with a fixed ansatz [5][6][7]. However, embedding the data directly through circuit (2) by applying a single (controlled) Z gate to the respective qubits, which is no longer a fixed ansatz, results in shallower circuits of depth O(k).
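The gate count of the fixed ansatz can be illustrated by enumerating all qubit subsets of size one, two and three for one function block. In the sketch below, the rule "λ = π exactly when the subset J equals the set of ones in the block, and λ = 0 otherwise" is an assumed stand-in for the mapping from x to λ; it reproduces the behavior described in the text but is not quoted from the paper. The function name is hypothetical.

```python
from itertools import combinations
from math import comb, pi
from typing import Dict, Sequence, Tuple


def ansatz_angles(block: Sequence[int]) -> Dict[Tuple[int, ...], float]:
    """Map every qubit subset J (size 1 to 3, 1-based) to its rotation angle.

    `block` is the n-bit multi-hot vector of a single function f_i.
    Assumed rule: the angle is pi only for the subset that matches the block.
    """
    n = len(block)
    active = tuple(j for j, bit in enumerate(block, start=1) if bit)
    angles: Dict[Tuple[int, ...], float] = {}
    for size in (1, 2, 3):
        for subset in combinations(range(1, n + 1), size):
            angles[subset] = pi if subset == active else 0.0
    return angles


n = 4
angles = ansatz_angles([1, 0, 1, 0])  # f(x) = (-1)^(x1*x3)
gates_per_function = comb(n, 1) + comb(n, 2) + comb(n, 3)
print(len(angles), gates_per_function)
print([J for J, lam in angles.items() if lam != 0.0])
```

Each function contributes the same $O(n^3)$ gate positions regardless of the data, and only the angles change with x, which is what makes the ansatz fixed.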

Discussion
The present work demonstrates that the feature map of VQC and the quantum kernels of QSVM can be used to solve the classification problem with the complexity of the k-FORRELATION problem that has previously been proven to be PROMISEBQP-complete. This means that it is possible to design the feature map of VQC and the quantum kernel of QSVM for any classification problem derived from any promise problem in PROMISEBQP. Because PROMISEBQP includes all decision problems in BQP as a special case, our results imply that it is possible to design the feature map of VQC and the quantum kernel of QSVM that solve any classification problem derived from any decision problem in BQP. If BQP ≠ BPP (classical bounded error probabilistic polynomial time), as required for exponential speed-up of quantum computing to exist, our results imply that VQC and QSVM must have quantum advantage over classical classifiers.
According to Havlíček et al. [5], every problem that can be solved by VQC can also be solved by QSVM, but the reverse does not generally hold. This connection is detailed in Schuld [7] and briefly outlined here. QSVM can be seen as VQC with an optimal measurement, i.e., W(θ) with an optimal ansatz and parameters, since W(θ) effectively changes the measurement basis. Generally, a fixed ansatz in W(θ) requires $O(2^{2n})$ degrees of freedom to express arbitrary measurements. In QSVM, this reduces to an m-dimensional optimization problem because, in the SVM dual view, measurements (↔ separating hyperplane) become expansions in the training data (↔ support vectors). Due to the concavity in Eq. (10), this is optimally solved given the kernel values $k(x_i, x_j)$ for all pairs of training data points. Therefore, QSVM is guaranteed to find solutions that are at least as good as those of VQC. In the present work, we show that both VQC and QSVM can solve a classification problem based on the k-FORRELATION problem, which implies that VQC and QSVM have an equivalent (universal) expressiveness from a computational complexity theory point of view.

Quantum-enhanced classification algorithms
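The two most common, and related, approaches to solving classification problems with quantum computers are VQC and QSVM [5], schematically depicted in Fig. 2. The VQC model first uses a feature map to encode the input data x into an n-qubit quantum state by a unitary transformation of the initial state $|0\rangle^{n}$: $|\Phi(x)\rangle = U_{\Phi(x)}|0\rangle^{n}$. Subsequently, a parameterized quantum circuit W(θ) transforms the state to enable classification by a quantum measurement. The parameters θ and an additional bias term b ∈ [−1, 1] are learned by classical optimization. A binary measurement probability $p_{+1}(x) = \langle\Phi(x)|W^\dagger(\theta) M_{+1} W(\theta)|\Phi(x)\rangle$ is estimated to classify x as positive if $p_{+1}(x) \ge (1-b)/2$ or as negative otherwise, under a choice of two projectors $M_{\pm 1}$ with arbitrary but fixed coefficients $h_z \in \{-1, 1\}$. The QSVM approach uses a quantum computer to estimate the kernel function $k(x_i, x_j)$ that is then used in the dual problem [5]:
$$\max_{\alpha} \ \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j k(x_i, x_j) \quad \text{subject to} \quad 0 \le \alpha_i \le C, \quad \sum_{i=1}^{m} \alpha_i y_i = 0.$$
The optimal solution is obtained classically by efficient quadratic optimization and determines the classification mapping of a (test) data sample s as $m(s) = \mathrm{sign}\big(\sum_{i=1}^{m} \alpha_i y_i k(x_i, s) + b\big)$. The kernel $k(x_i, x_j) = |\langle\Phi(x_i)|\Phi(x_j)\rangle|^2 = |\langle 0^n|U^\dagger_{\Phi(x_i)} U_{\Phi(x_j)}|0^n\rangle|^2$ is estimated as the measurement probability of the $0^n$ bit-string.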

FORRELATION
Complexity classes such as P or BQP are defined for decision problems with inputs necessarily belonging to '+' or '-' instances. If the inputs include a set that corresponds to neither '+' nor '-', the decision problems are generalized to promise problems [28]. To make decisions, promise problems consider only inputs from the subsets corresponding to the '+/-' instances (i.e., inputs that are promised to lead to a '+' or '-' decision). An example of a promise problem is the FORRELATION problem introduced in Aaronson [29], and refined and extended in Aaronson and Ambainis [26]. This problem considers two Boolean functions $f, g: \{0,1\}^n \to \{-1,1\}$. The quantity
$$\Phi_{f,g} = \frac{1}{2^{3n/2}} \sum_{x,y \in \{0,1\}^n} f(x)(-1)^{x \cdot y} g(y)$$
with $x \cdot y = \sum_{i=1}^{n} x_i y_i$ determines the amount of correlation between f and the Fourier transform of g, i.e., the "forrelation" of f and g. Analogously to correlation, one can say that f and g are "forrelated" if the value $\Phi_{f,g}$ is large, and not forrelated if it is small.
The FORRELATION problem is solvable with a quantum algorithm [29] using a single query with an error probability of 2/5, which can be reduced arbitrarily by increasing the query complexity by a constant factor. Therefore, a quantum algorithm exists that solves the problem with error probability ≤ 1/3 using a constant number of queries, while the circuit implementing the queries remains of polynomial size, which makes it a PROMISEBQP problem [26]. As any decision problem is a trivial special case of a more general promise problem, the class of PROMISEBQP problems includes BQP entirely, as depicted in Fig. 1.
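The generalization to k Boolean functions $f_1, \dots, f_k: \{0,1\}^n \to \{-1,1\}$ with
$$\Phi_{f_1,\dots,f_k} = \frac{1}{2^{(k+1)n/2}} \sum_{x_1,\dots,x_k \in \{0,1\}^n} f_1(x_1)(-1)^{x_1 \cdot x_2} f_2(x_2)(-1)^{x_2 \cdot x_3} \cdots (-1)^{x_{k-1} \cdot x_k} f_k(x_k)$$
and $x \cdot y = \sum_{i=1}^{n} x_i y_i$ leads to a promise problem:
$$\Pi_+ = \{(f_1,\dots,f_k) : \Phi_{f_1,\dots,f_k} \ge 3/5\}, \qquad \Pi_- = \{(f_1,\dots,f_k) : |\Phi_{f_1,\dots,f_k}| \le 1/100\}.$$
Here, $\Pi_\pm$ are the sets of ± problem instances with $\Pi_+ \cap \Pi_- = \emptyset$.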
This definition generally allows the evaluation of the functions $f_1, \dots, f_k$ by oracle queries. Furthermore, for explicit descriptions, which we assume in this work, Aaronson and Ambainis [26] proved the following theorem: Theorem 2 (PROMISEBQP-completeness). If $f_1, \dots, f_k$ are described explicitly (e.g., by circuits to compute them), and k = poly(n), then k-FORRELATION is PROMISEBQP-complete. They also showed that this still holds when each function is restricted to be either of the form $f_i(x) = (-1)^{C_i(x)}$, where $C_i(x)$ is a product of at most 3 input bits, or constant, $f_i(x) = 1$, while at least one $f_i(x)$ must depend on exactly 3 bits in x. Note the crucial difference: k-FORRELATION (under the stated conditions) is not only a PROMISEBQP problem but a PROMISEBQP-complete problem.

odd-k-FORRELATION
Theorem 1 is used for the construction of the data set in the present work. It is restated and proven in the following: Theorem 1 (odd-k-FORRELATION). Explicit k-FORRELATION remains PROMISEBQP-complete when k is restricted to odd k ≥ 3.
Proof. By construction, odd-k-FORRELATION is a special case of k-FORRELATION, which trivially implies that odd-k-FORRELATION is in PROMISEBQP. For PROMISEBQP-completeness, it remains to show that odd-k-FORRELATION is PROMISEBQP-hard via a proof by reduction: we provide a polynomial mapping from every instance of k-FORRELATION to an instance of odd-k-FORRELATION that preserves the forrelation value Φ, which indicates that odd-k-FORRELATION is at least as difficult as k-FORRELATION.
If k is odd in an instance of k-FORRELATION, it is trivially an instance of odd-k-FORRELATION. If k is even in an instance of k-FORRELATION, we add 4⌈n/2⌉ − 1 Boolean functions, resulting in the odd number k + 4⌈n/2⌉ − 1. The additional functions are chosen such that they are either constant, f(x) = +1, or of the form $f(x) = (-1)^{x_i x_j}$ with i, j ∈ {1, …, n}, fulfilling the necessary conditions. We show that $\Phi_{f_1,\dots,f_k} = \Phi_{f_1,\dots,f_{k+4\lceil n/2\rceil-1}}$ as follows.
The proof of Theorem 25 in Aaronson and Ambainis [26] uses a gadget applied to two qubits i and j with i ≠ j that converts an even number of $H^{\otimes 2}$ gates into an odd number. The gadget uses three controlled-Z gates (CZ), each of which implements $f(x) = (-1)^{x_i x_j}$. We apply this gadget successively to ⌈n/2⌉ non-overlapping pairs of qubits to reproduce the final layer of Hadamard gates. The gadgets require 3⌈n/2⌉ CZ gates and ⌈n/2⌉ − 1 constant functions, so that every fourth of the additional functions produces an identity between two gadgets. In total, an odd number of Boolean functions $f_{k+1}, \dots, f_{k+4\lceil n/2\rceil-1}$ is added. This extends the problem instance from an even to an odd number of Boolean functions, while keeping the circuit equivalent (under SWAP operations) to the original one defined by the even number k of Boolean functions. In other words, the value Φ is preserved since SWAP operations do not affect the amplitude of $|0\rangle^{n}$. For the pairwise application of the 2-qubit gadgets in the case of an odd number of qubits n, one can introduce an ancilla qubit in $|0\rangle$. The final result remains unaffected as this (n + 1)-th qubit ends up in $|0\rangle$ and is, therefore, not entangled.
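The two-qubit identity behind such a gadget can be checked with exact 4×4 matrix arithmetic. The sketch below verifies that three repetitions of a CZ gate followed by a Hadamard layer equal a SWAP, so that three CZ gates interleaved with two Hadamard layers act as a single Hadamard layer up to a SWAP. The precise gate ordering used by Aaronson and Ambainis is not reproduced in the text, so the ordering here is an assumption, and the helper names are hypothetical.

```python
from fractions import Fraction
from typing import List

Matrix = List[List[Fraction]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 4x4 matrices exactly."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def added_functions(n: int) -> int:
    """Number of functions added for even k: 3*ceil(n/2) CZ + ceil(n/2) - 1 constants."""
    pairs = -(-n // 2)  # ceil(n / 2)
    return 3 * pairs + (pairs - 1)


half = Fraction(1, 2)
# H (x) H has entries (1/2) * (-1)^(r . c) for 2-bit row and column indices.
HH: Matrix = [[half * (-1) ** bin(r & c).count("1") for c in range(4)]
              for r in range(4)]
# CZ flips the sign of |11> only.
CZ: Matrix = [[Fraction(-1 if r == 3 else 1) if r == c else Fraction(0)
               for c in range(4)] for r in range(4)]
SWAP = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]

layer = matmul(HH, CZ)  # one CZ followed by one Hadamard layer
gadget = matmul(layer, matmul(layer, layer))
print(gadget == SWAP, added_functions(3), added_functions(4))
```

Because a SWAP leaves the all-zero state unchanged, replacing Hadamard layers in this way does not alter the amplitude of $|0\rangle^{n}$, and the number of added functions 4⌈n/2⌉ − 1 is odd for every n.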

Data availability
Data sharing is not applicable to this article as no data sets were generated or analyzed during the current study.